[57] ABSTRACT

A method and apparatus for monitoring a plurality of data 
processing systems from a monitoring system. The data 
processing systems may be coupled to the monitoring sys- 
tem via a network cloud. When one of the plurality of data 
processing systems experiences a failure, the failure is 
detected at the monitoring system based upon communica- 
tions over the network. The data processing systems may 
each have a service processor directly coupled to the net- 
work cloud. The monitoring system can also be employed to 
monitor the status of the data processing systems, either in 
a manufacture/test environment or in the field. The moni- 
tored status can include an inventory of parts for the data 
processing systems. Each part can be provided with identi- 
fication information that is readable by the data processing 
system when the part is installed, and the identification 
information can be used to automatically generate an inven- 
tory of parts for each of the data processing systems. The 
monitoring system can also be used to automatically download
an updated piece of software to the data processing
systems. In one aspect of the invention, bidirectional com- 
munication is employed between the monitoring system and 
the data processing systems. When an event occurs on the 
data processing system, the system sends a service request 
to the monitoring system notifying it of the event. The 
monitoring system also sends periodic communications to 
the data processing systems to ensure that each is function- 
ing properly. 
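
By way of illustration only, the bidirectional communication described above might be sketched in Python as follows. The class names Agent and Monitor, the in-memory list standing in for a database, the queue held by the agent, and the reachable flag used to simulate a network outage are all hypothetical and are not taken from the specification.

```python
from collections import deque


class Monitor:
    """Hypothetical monitoring system with an in-memory 'database'."""

    def __init__(self):
        self.database = []      # (system name, event) records
        self.reachable = True   # False simulates the network being down

    def service_request(self, agent):
        # Service a request by draining the agent's queue in arrival order.
        if not self.reachable:
            return False
        while agent.pending:
            self.database.append((agent.name, agent.pending.popleft()))
        return True

    def heartbeat(self, agents):
        # Periodic communication: report systems that fail to respond.
        return {agent.name for agent in agents if not agent.ping()}


class Agent:
    """Hypothetical agent running on one monitored data processing system."""

    def __init__(self, name):
        self.name = name
        self.pending = deque()  # events awaiting service by the monitor
        self.alive = True       # False simulates a failed system

    def on_event(self, monitor, event):
        # Queue the event first so it is not lost if servicing is delayed,
        # then send a service request notifying the monitor.
        self.pending.append(event)
        return monitor.service_request(self)

    def ping(self):
        return self.alive
```

In this sketch an event raised while the monitor is unreachable simply stays queued and is recorded the next time a service request succeeds, while the heartbeat covers the case in which a system can no longer send any request at all.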

73 Claims, 7 Drawing Sheets 
METHOD AND APPARATUS FOR
MONITORING COMPUTER SYSTEMS 
DURING MANUFACTURING, TESTING AND 
IN THE FIELD 

FIELD OF THE INVENTION 

The present invention is directed to the field of 
manufacturing, testing and field service for computer sys- 
tems. 

DISCUSSION OF THE RELATED ART 

The manufacture and testing of a computer or data 
processing system is a complex matter, particularly when a 
large number of systems are being manufactured and tested 
simultaneously. As used herein, the term computer system or 
data processing system includes not only general purpose 
computers, but also other types of computer-related systems 
that include some data processing capabilities. Computer 
systems typically include many components that are inte- 
grated together to form a complete system. An example of 
such a computer system is a storage subsystem that includes 
a plurality of disc drives, such as the SYMMETRIX line of 
disc arrays available from EMC Corporation, Hopkinton, 
Mass. Such a storage subsystem includes a large number of 
disc drives, power supplies and electronic boards that con- 
trol operation of the storage subsystem. The manufacture 
and testing of the storage subsystem is typically a three step 
process. First, each of the components or subassemblies 
(e.g., the boards, drives and power supplies) is tested sepa- 
rately to ensure that they function properly. Next, compo- 
nents that pass the subassembly test are integrated together 
to form completed systems. Finally, system level testing is 
performed to ensure that each system functions properly and 
is ready to be shipped to a customer. 

The goals of an effective manufacturing and test proce- 
dure are many. Obviously, it is desired to have a compre- 
hensive set of tests run at both the subassembly and system 
test levels to detect any failures so that they can be addressed 
prior to shipping the system to the customer. At the system 
level, these tests typically involve the execution of 
application-level programs that are designed to test all
functional aspects of the system. Such application-level
programs are executed while the system is subjected to
numerous environmental conditions (e.g., heat, cold,
vibration, etc.). Software is typically employed to extract
information about the execution of the application-level
programs and to record any system failures. When a failure 
occurs during the manufacture/test process, the failure is 
resolved or dispositioned. Depending upon the severity of 
the failure, it can be addressed by various individuals in the 
design/manufacturing operation. For example, simple errors 
may be dispositioned by a technician, with more complex 
errors being addressed by a test engineer, and others requir- 
ing involvement by the system's design engineers. It should 
be appreciated that it is desirable to have each error dispo- 
sitioned at the lowest level necessary, such that design 
engineers are not called in to address problems that could be 
more properly handled by a technician. Thus, typical 
manufacture/test procedures develop a protocol so that each 
failure can be dispositioned at the lowest level possible. 

When a failure occurs during the test process, it is 
desirable to maintain a record of how the failure was 
dispositioned. This can be particularly important in the event 
that a system fails in the field. One goal of the manufacture/ 
test process is to ensure that all errors are detected before the 
system is shipped to the customer. Thus, if a system fails in 
the field, it is desirable to determine why the testing process
did not detect the error prior to shipping, and to adapt the
process so that it can detect similar failures in the future. The
maintenance of records indicating the manner in which all
errors on a particular system were dispositioned can be
extremely helpful in determining why a particular failure in
the field was not detected during the manufacture/test pro-
cess.

It should be appreciated that system level testing for a
complex data processing system can be time consuming,
often taking several weeks per system. It is desirable to
minimize this time as much as possible. The components
that make up a complete system can represent a significant
capital expenditure in inventory. Thus, it is desirable to
reduce the amount of time that the inventory is tied up prior
to shipment to the customer.

It should also be appreciated that at any given time, a large
number of systems can be at different stages in the
manufacture/test process. Thus, it is desirable for a
manufacture/test procedure to facilitate easy tracking so that
it can be determined at precisely what stage each system is
in the manufacture/test process.

Conventional manufacture/test procedures have typically
involved the use of a number of technicians to manually
monitor the systems in a test facility to determine where
each system is in the test process, and to determine when
errors occur that need to be dispositioned. Each system is
monitored by the technicians simply taking paper notes.
This procedure is cost intensive in that a number of indi-
viduals are necessary to monitor a large manufacture/test
facility. In addition, such a procedure is not conducive to
sharing information amongst test groups. For example, if a
particular subsystem caused a failure during system test, that
information is conventionally not easily accessible to those
individuals in the group responsible for testing that particu-
lar subassembly, thereby making it difficult to adapt the
subassembly test to detect failures of that type in the future.

In more recent advances, paper notes have been replaced
by the test technicians recording the status of each system
and its failures on a floppy disc. The information can then be
placed onto a central computer to make it accessible to other
members of the test organization. However, this procedure
still suffers from a number of the same disadvantages as the
paper based system. In particular, a number of individuals
are required to manually walk the test floor to check the
status of each system under test. In addition, the storage and
maintenance of the test information is problematic, as the
floppy discs can consume a large volume of physical storage
space and are susceptible to viruses that can destroy the data.

More recently, the assignee of the present application has
employed an electronic monitoring technique to monitor its
manufacture/test process for the SYMMETRIX line of stor-
age subsystems. This technique is similar in some respects
to a "call home" technique that has been employed to
address errors in the field with the SYMMETRIX line of
storage subsystems. The call home technique involves the
self-detection by a SYMMETRIX in the field that it is
experiencing problems, and a notification being automati-
cally transmitted to a customer service center. As will be
appreciated by those skilled in the art, a SYMMETRIX
storage subsystem, like many other computer or data pro-
cessing systems, includes a service processor that can be
used to control the operation of the system. In association
with the call home feature, the service processor detects
when a problem is being experienced by the system, and
utilizes a modem attached thereto to dial into the customer
service center. The call from the field either identifies the
type of problem being experienced by the system, or the
nature of the problem can be identified by a technician at the
customer service center who remotely accesses the system.
The problem can then, if possible, be addressed remotely, or
a customer service representative can be dispatched to
address the problem in the field, often before the customer
is even aware that any problem has been experienced.

The technique employed in the manufacture/test environ-
ment is similar in that a modem is employed to electronically
couple the systems under test to a central monitoring com-
puter system, so that the status of all of the systems under
test can be monitored from the central computer. However,
a modification is made in that the systems under test are not
required to call home when they experience a test failure.
The reason for this modification is that if any of the systems
has an error or failure that is too significant, the system may
not have the ability to call home to the monitor system.
Thus, a polling technique has been employed so that the
central monitoring system polls through each of the systems
in the manufacture/test operation to collect information
regarding their status and the failures experienced by each
system under test.

This prior art polling technique is shown schematically in
FIG. 1. A plurality of systems 1, 3 under test are shown. The
number of systems can be any number that is possible and/or
convenient to put through the manufacture/test process at
any particular time. The central monitor system 5 includes a
plurality of polling machines 7, 9 (labeled in FIG. 1 as
PM1-PMX). The selection of the optimum number of
polling machines is a balance between competing factors.
The use of a single polling machine to monitor all of the
relevant information from each of the systems 1, 3 under test
would provide a relatively simple monitor system 5.
However, the fewer the number of polling machines, and the
more information that each polling machine is responsible
for collecting, the longer it will take for a polling machine
to complete a polling cycle through all of the systems 1, 3
under test.

The polling machines monitor the operation of the sys-
tems under test by checking the date and time stamps on
particular files, and when a change has occurred since the
last time the file was polled, the updated information is
transmitted to the central monitor system 5, where it is
stored in a manner discussed below. If a polling cycle is too
long, it is possible that a change in data may occur in a
particular test system that would be overwritten before the
polling machine had a chance to collect the relevant infor-
mation. Therefore, it is desirable to configure the monitor
system 5 so that each polling cycle is sufficiently short in
relation to the rate at which data is updated in the systems
under test to minimize the chance that relevant information
will go undetected.

To address the foregoing concerns, multiple polling
machines have been employed. As shown in FIG. 1, each of
the polling machines PM1-PMX is connected to each of the
systems 1, 3 under test. Each polling machine is responsible
for sampling only a subset of the information to be gathered.
In one implementation, three polling machines have been
employed, each sampling different files in the systems under
test.

When relevant information from any of the systems 1, 3
under test is updated, the information is transferred, via a
modem/telephone line connection 11-14, from the test sys-
tem to the relevant polling machine. As discussed above,
most computer or data processing systems employ a service
processor. For example, the SYMMETRIX line of storage
systems employs a lap top computer as the service processor.
The service processor can be used to communicate with the
polling machines over the modem/telephone line connec-
tion. Each of the polling machines is further coupled to a
database 17, so that the information that the polling
machines collect relating to the status of the systems under
test is stored in a central database associated with the
monitor system 5. Thus, the relevant information regarding
all of the systems 1, 3 under test can be monitored by examining
the database in the central monitor system 5. The database
is stored in a storage system (e.g., from the SYMMETRIX
line of storage systems) to overcome the above-described
problems in the use of floppy discs.

The polling machines can be implemented in a straight
forward manner using any type of computer (e.g., a PC).
Each polling machine stores a list of the systems 1, 3 that it
is responsible for polling, as well as the telephone numbers
of their modems. Each polling machine simply steps through
its list, and has certain files that it is responsible for
checking. If the date and time stamp for the last update of the
file has changed since the last time it was polled, the relevant
information from the file is read by the polling machine and
then transferred to the database.

As should be appreciated from the foregoing, if one of the
polling machines experiences a problem, it could result in
some relevant test information not being passed to the
database 17, and being lost from the monitor system 5. This
is particularly significant if only a single polling machine is
employed to monitor a particular file within any of the
systems 1, 3 under test, because there is no fault tolerance in
such an implementation of the monitor system 5. To address
this concern, the central monitor system 5 includes an event
monitor 19 that is coupled to (e.g., via a network within the
monitoring system 5) and monitors the status of each of the
polling machines PM1-PMX to ensure that they continue to
function properly. The event monitor can be implemented in
a straight forward fashion by a processor (e.g., a PC) that
simply pings each of the polling machines. For example,
when the polling machines are implemented by a PC run-
ning the Windows NT operating system, the event monitor
19 can simply monitor the Windows NT system log to
determine whether the polling machine is continuing to
operate. Alternatively, a particular file that should be
updated repeatedly can be monitored, and if it is not updated
within a particular time period (e.g., 10 minutes), the event
monitor determines that the polling machine is not operating
properly. When a problem with one of the polling machines
is encountered, the event monitor 19 sends an e-mail and a
page to a system administrator who can respond to address
the problem.

Although the manufacture/test procedure shown in FIG. 1
works well, it has some disadvantages. As discussed above,
there is a risk inherent with the polling scheme implemented
in FIG. 1 in that any file updated multiple times during one
polling cycle results in some data being missed by the monitor
system 5 and lost forever. Furthermore, fault tolerance and
reliability are a concern. In particular, the modem connec-
tions that implement the electronic links between the central
monitor system 5 and the systems 1, 3 under test are not as
reliable as would be desired. In addition, as discussed above,
a single point of failure may be encountered if there is a
single polling machine that is solely responsible for moni-
toring any of the test data within one of the systems under
test.

In view of the foregoing, it is an object of the present
invention to provide an improved method and apparatus for
monitoring the manufacture/test of computer or data pro-
cessing systems.

Furthermore, as discussed above, although the call home
technique for monitoring systems in the field works well, it
suffers from some reliability disadvantages due to the fact
that a system experiencing failures must place a call, over a
modem/telephone line connection, to the customer service
center. Thus, it is a further object of the present invention to
provide an improved method and apparatus for providing
customer service to computer or data processing systems in
the field.

In addition, it is an object of the present invention to
provide an improved method and apparatus for controlling
inventory in the manufacture of a computer or data process-
ing system.

SUMMARY OF THE INVENTION

One illustrative embodiment of the invention is directed
to a method of monitoring a plurality of data processing
systems from a monitoring system to determine when any of
the data processing systems experiences a failure. The
method comprises the steps of: (A) coupling the plurality of
data processing systems to the monitoring system via a
network cloud; and (B) when one of the plurality of data
processing systems experiences a failure, detecting the fail-
ure at the monitoring system based upon communications
over the network cloud between the one of the plurality of
data processing systems and the monitoring system.

Another illustrative embodiment of the invention is
directed to an apparatus comprising a network cloud; a
plurality of data processing systems coupled to the network
cloud; and a monitoring system, coupled to the network
cloud, that monitors the plurality of data processing systems
to determine when any of the data processing systems
experiences a failure, wherein the monitoring system detects
a failure in one of the data processing systems based upon
communications over the network cloud between the one of
the plurality of data processing systems and the monitoring
system.

A further illustrative embodiment of the invention is
directed to an apparatus comprising a network cloud; and a
data processing system having a service processor directly
coupled to the network cloud.

Another illustrative embodiment of the invention is
directed to a method of monitoring the status of a plurality
of data processing systems from a monitoring system, the
plurality of data processing systems being coupled to the
monitoring system via a network cloud. The method com-
prises the steps of: (A) when the status of one of the plurality
of data processing systems is updated, modifying a file
within the one of the plurality of data processing systems to
reflect the updated status; and (B) in response to the modi-
fication of the file in the one of the data processing systems,
transmitting information reflecting the updated status of the
one of the plurality of data processing systems over the
network cloud from the one of the data processing systems
to the monitoring system.

A further illustrative embodiment of the invention is
directed to an apparatus comprising a network cloud; a
plurality of data processing systems coupled to the network
cloud, each one of the plurality of data processing systems
having at least one file that represents a status of the one of
the plurality of data processing systems; and a monitoring
system, coupled to the network cloud, that monitors the
status of the plurality of data processing systems. Each one
of the plurality of data processing systems includes means,
responsive to an update of the at least one file that represents
the status of the one of the plurality of data processing
systems, for transmitting information reflecting the update to
the at least one file over the network cloud to the monitoring
system.

Another illustrative embodiment of the invention is
directed to a method of monitoring an inventory of parts in
a plurality of data processing systems. The method com-
prises the steps of: (A) providing each part with identifica-
tion information that is readable by one of the plurality of
data processing systems when the part is installed in the one
of the plurality of data processing systems; and (B) using the
identification information for the plurality of parts in each
one of the data processing systems to automatically generate
an inventory of parts for each of the plurality of data
processing systems.

A further illustrative embodiment of the invention is
directed to an apparatus, comprising a plurality of data
processing systems; and a service center that is coupled to
the plurality of data processing systems and provides at least
one service to the plurality of data processing systems. Each
one of the plurality of data processing systems includes
request means for transmitting a service request to the
service center requesting a check of whether a resource in
the one of the plurality of data processing systems is up to
date. The service center includes means, responsive to each
service request, for transmitting information back to a
requesting data processing system indicating whether the
resource in the requesting data processing system is up to
date.

Another illustrative embodiment of the invention is
directed to a method of automatically downloading an
updated piece of software to a plurality of data processing
systems, the plurality of data processing systems each being
coupled to a service center. The method comprises the steps
of: (A) providing the updated piece of software on the
service center; (B) periodically receiving service requests
from each of the plurality of data processing systems, each
service request including information from which a deter-
mination can be made as to whether the data processing
system that transmitted the request has a copy of the updated
piece of software; (C) in response to the service requests,
automatically determining which of the plurality of data
processing systems do not have a copy of the updated piece
of software; and (D) automatically downloading a copy
of the updated piece of software to the data processing
systems that do not have a copy of the updated piece of
software.

A further illustrative embodiment of the invention is
directed to a method of using a monitoring system to
monitor the status of a plurality of data processing systems
in a manufacture/test facility. The method comprises the
steps of: (A) executing a plurality of tests on each of the
plurality of data processing systems to test the functional
operation of the plurality of data processing systems, each
one of the plurality of tests generating a failure when one of
the plurality of data processing systems does not properly
execute the one of the plurality of tests; (B) when a failing
one of the plurality of data processing systems experiences
a failure, storing information in the failing one of the
plurality of data processing systems identifying a nature of
the failure, and broadcasting a service request from the
failing one of the plurality of data processing systems to the
monitoring system, the service request indicating that the
failure has occurred; and (C) storing information in the
monitoring system to record the failure in response to
information provided by the failing one of the plurality of
data processing systems.
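
As a non-limiting illustration, steps (C) and (D) of the software download method summarized in this section might be sketched as below. The request fields system and version, the use of a version string as the information from which the determination is made, and the dictionary standing in for the actual file transfer are all assumptions made for the example and do not appear in the specification.

```python
def systems_needing_update(service_requests, updated_version):
    """Step (C): list systems whose reported version is not the updated one."""
    stale = []
    for request in service_requests:
        # 'system' and 'version' are hypothetical field names.
        out_of_date = request["version"] != updated_version
        if out_of_date and request["system"] not in stale:
            stale.append(request["system"])
    return stale


def download_updates(service_requests, updated_version, installed):
    """Step (D): record a download for each system found to be out of date."""
    for name in systems_needing_update(service_requests, updated_version):
        installed[name] = updated_version  # stands in for the real transfer
    return installed
```

Because the determination is driven entirely by the periodically received service requests, a system that already holds the updated piece of software is left untouched.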
Another illustrative embodiment of the invention is
directed to a method of using a monitoring system to
monitor the status of a plurality of data processing systems
in a manufacture/test facility. The method comprises steps
of: (A) executing a plurality of tests on each of the plurality
of data processing systems to test the functional operation of
the plurality of data processing systems, each one of the
plurality of tests generating a failure when one of the
plurality of data processing systems does not properly
execute the one of the plurality of tests; (B) periodically
transmitting inquiries from the monitoring system to each of
the plurality of data processing systems requesting informa-
tion as to whether the one of the plurality of data processing
systems has experienced a failure; and (C) when a failing
one of the plurality of data processing systems experiences
a failure, storing information in the failing one of the
plurality of data processing systems identifying a nature of
the failure, and responding to one of the periodic inquiries
by transmitting the stored information that indicates the
nature of the failure to the monitoring system.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a prior art monitoring
procedure that employs polling techniques to monitor a
number of systems under test;

FIG. 2 is a block diagram of one embodiment of the
invention that connects a monitoring system to a number of
systems, either under test or in the field, via a network;

FIG. 3 illustrates one exemplary implementation of a
plurality of servers employed in the monitoring system of
FIG. 2;

FIG. 4 illustrates the embodiment of the invention shown
in FIG. 3, wherein the network is subdivided into a plurality
of subnetworks and the monitoring system is subdivided into
a plurality of service areas;

FIG. 5 is a block diagram of a service processor in the
computer systems monitored in accordance with the present
invention;

FIGS. 6a, 6b and 6c are a flow chart of a program that
implements an agent on the systems being monitored in
accordance with the embodiments of the invention shown in
FIGS. 2-5; and

FIG. 7 illustrates a system architecture in accordance with
another illustrative embodiment of the invention wherein the
service processor of a data processing system is directly
coupled to a network.

DETAILED DESCRIPTION OF THE
INVENTION

The disadvantages discussed above in connection with the
prior art test monitoring procedure of FIG. 1 are addressed
in one embodiment of the present invention shown in FIG.
2. In this embodiment of the invention, each of the systems
21-23 under test is connected, via a network 27, to a central
monitoring system 25 labeled in FIG. 2 as an APC (adaptive
process control) monitor. The network 27 can be of any type,
and can include an intranet in a campus type environment,
or can include the Internet. Thus, the systems under test 21,
23 share the network connection to the APC monitor, such
that the network connection is not used exclusively for
communication between the APC monitor and any one of
the systems under test. By employing a network 27 to
electronically couple the systems under test 21-23 to the
APC monitor 25, the unreliability inherent in the use of
modem and telephone line connections is overcome. Com-
munication between the APC monitor 25 and each of the
systems under test 21-23 is facilitated by assigning each of
the test systems a particular name or address recognized by
the network 27. As discussed more fully below, the moni-
toring system shown in FIG. 2 can be used to not only
monitor systems during the manufacture/test process, but
can also be used to monitor systems in the field. The APC
monitor 25 includes a database 35 to store information
concerning the systems being monitored.

In one embodiment of the invention, the database used
during the manufacture/test monitoring process is also used
to manage the inventory of the parts and subcomponents
(collectively "parts") used in the systems under test. The
database is automatically updated to maintain accurate infor-
mation regarding the parts in each system under test.

Although the loss of some test data due to the polling delay
in the prior art procedure of FIG. 1 might be tolerated, the
same is not true when the database is also employed for
inventory management and control. For example, in accor-
dance with one embodiment of the invention, if a technician
were to swap out a power supply in test system 21 and
replace it with another part, the monitoring system of FIG.
2 will detect this change. The swapping of parts is reflected
in an update to a file on test system 21. If this transaction was
not detected (e.g., due to delay in the polling time of the
prior art monitor system of FIG. 1), the event would go
unrecorded in the database, which would then be inaccurate
for inventory control purposes.

To address the concern regarding inventory transactions
or test file updates being missed as a result of the polling
loop time in the prior art arrangement of FIG. 1, the
embodiment of the invention shown in FIG. 2 employs a
transaction-based procedure. In particular, each of the sys-
tems 21, 23 being monitored detects situations wherein
information should be updated in the APC monitor 25, and
notifies the monitor 25. This is similar to the call home
feature discussed above, except that the notification is trans-
mitted over the network 27, rather than a modem/telephone
line connection. Each of the monitored systems 21, 23 has
an associated agent 29, 31. Each agent 29, 31 monitors the
relevant files of its associated system 21, 23, and when any
of those files is updated, the agent performs two functions.
First, the agent broadcasts a service request to the APC
monitor 25 over the network 27, indicating that there has
been a change of a relevant file that the APC monitor 25
should be aware of. Second, the agent stores or queues the
updated information so that as the monitored system con-
tinues to operate, the queued information will not be lost if
the relevant file is updated again, and will be available to the
monitor 25 when it services the request. The queuing
of the information by the agent ensures that no relevant
information will be lost, even if there is a delay (e.g., due to
the network 27 going down) in the APC monitor 25 servic-
ing the broadcast request. The transaction based procedure is
also advantageous in that it results in real time updates of the
information in the APC monitor 25.

The APC monitor 25 includes at least one server 33 that
is responsible for servicing the requests broadcast by the
agents 29, 31 over the network 27. In a manner that is
discussed in more detail below, the servers 33 handle the
broadcast requests by reading the relevant information from
the requesting agent 29, 31 over the network 27, and then
updating the database 35 with the new information provided
by the agent.

Relying upon the systems 21, 23 being monitored to
notify (via their agents) the APC monitor 25 of status
updates either during the test process or in the field presents
a problem similar to that discussed above in connection with the "call
home" procedure, i.e., if any of the systems 21, 23 experiences a
problem significant enough to prevent the agent 29, 31 from making a
broadcast over the network 27, information relating to that system
could be lost. To address this concern, one embodiment of the present
invention includes a special type of server 37 labeled as a heart beat
server. The purpose of the heart beat server 37 is to poll each of the
systems being monitored to ensure that each of the agents 29, 31 is
continuing to operate. In this respect, the heart beat server 37 can
simply ping each of the monitored systems in any of a number of
manners to ensure that the system is still capable of communicating
over the network 27.

It should be appreciated from the foregoing that the embodiment of the
invention shown in FIG. 2 employs bidirectional communication between
the systems 21, 23 being monitored and the APC monitor 25 to ensure
that relevant information is not lost. Communication originating from
the monitored systems is transaction based, in that each transaction
generates a service request, ensuring that no transactions (e.g., a
swap of components for inventory tracking or an update of test data)
will be missed. The queuing feature ensures that information is not
lost even if there is a delay in the APC monitor 25 servicing the
request. Finally, the heart beat server 37 ensures that if any of the
monitored systems 21, 23 experiences a problem rendering it incapable
of sending a broadcast over the network 27, this problem is
immediately recognized so that action can be taken to address the
problem.

In accordance with the embodiment of the invention relating to
inventory control, when each part is fabricated, it is provided with a
part number and a serial number that are stored in a storage element
(e.g., a memory chip) on the part. If a part is added, removed or
replaced from a system under test or in the field, a file in the
monitored system is updated to reflect the change. For example, for
the illustrative example described above relating to a storage
subsystem, there can be a file that includes the part type and serial
numbers for every disc drive in the system. The updating of the file
on the monitored system 21, 23 triggers the broadcasting of a service
request to the APC monitor 25, which then updates its database 35 to
reflect the change for inventory tracking purposes.

The servers 33 can be implemented by a single device that includes
each of the services necessary to service broadcast requests from the
agents 29, 31, or multiple servers can be employed. Each server can be
implemented in any of a number of ways. For example, each server can
be implemented via a PC capable of communicating over network 27.

In one embodiment of the invention shown in FIG. 3, multiple servers
are provided for fault tolerance reasons. In FIG. 3, a plurality of
servers 33A-33N is provided. Each of the servers includes at least one
service. The services that respond to service requests broadcast over
the network 27 by the agents 29, 31 (FIG. 2) can each be implemented
simply as a program, run on a PC or other device that implements the
server, that is idle and awaits an appropriate broadcast request to
initiate the program. There are many different types of services that
can be implemented, with the specific types of services employed being
dependent upon the nature of the systems 21, 23 being monitored. One
example of a service is the heart beat service 37 discussed above in
connection with FIG. 2. In addition, for the illustrative embodiment
described above wherein the monitored systems are storage subsystems
that include a plurality of disc drives, another type of service can
include a disc service 39 that includes information relating to the
disc drives in the monitored system. Other examples include a test
service 41 that includes information relating to factors such as
environmental testing, and a hardware service 43 that includes
information relating to the other hardware in the systems being
monitored.

It should be appreciated that each service need only be implemented on
a single one of the servers 33A-33N. However, in one embodiment of the
present invention, each service is implemented on at least two of the
servers. For example, the disc service 39 shown in FIG. 3 is
implemented on both server 33A and server 33N. When each service is
implemented on at least two servers, the fault tolerance of the system
is improved because there is no single point of failure that would
cause the APC monitor 25 (FIG. 2) to stop collecting information from
the systems under test. In addition, implementing the same service on
multiple servers enables multiple servers to respond to the same type
of service request. This is advantageous because multiple agents 29,
31 associated with the monitored systems are capable of sending the
same type of service request simultaneously. Having multiple servers
available to respond to the same type of request enables multiple
service requests of the same type to be processed simultaneously. It
should be appreciated that there is tremendous flexibility in the
manner in which the services can be distributed amongst the servers
33A-33N, such that any number of services can be implemented on any of
the servers.

In the embodiment of the present invention wherein multiple ones of
the servers 33A-33N implement the same service, a protocol is employed
to determine which of those servers will handle each service request.
As discussed in more detail below, in one embodiment of the invention,
each of the servers responds to a service request with some
information concerning the server, and the agent that broadcasted the
request uses that information to select which server will handle the
request. It should be appreciated that the present invention is not
limited in this respect, and that numerous other protocols can be
employed.

A large manufacture/test operation may include multiple facilities,
some located great distances from each other and perhaps in different
countries. The present invention is flexible enough to be employed
within a single manufacture/test facility, and to also network
together different facilities, even if they are separated by great
distances. Similarly, when monitoring systems in the field, the
present invention enables a plurality of different customer service
centers to be networked together. FIG. 4 illustrates one
implementation of a system employing the aspects of the present
invention discussed above in connection with FIG. 3 to link together
two facilities which may be disposed at remote locations. It should be
appreciated that similar techniques can be employed to network
together different areas in a single facility.

Networks typically include a plurality of subnetworks that each
interconnects a plurality of devices. Such an arrangement is shown in
FIG. 4, wherein the network 27 is shown as split into a plurality of
subnetworks 27A-27N. In the embodiment shown in FIG. 4, the APC
monitor 25 is subdivided into a plurality of service areas 25A-25N,
each corresponding to one of the subnetworks 27A-27N. Each of the
service areas 25A-25N includes one or more servers 33 and a database
35 that operate together in the manner described above to service
requests from the agents 29, 31 associated with those monitored
systems 21, 23 that are coupled to the corresponding subnetwork. The
databases 35
within each of the plurality of service areas can be consolidated via
an information warehouse 51, so that all of the information stored
within the multiple service areas is accessible from a central
location. The information warehouse can be implemented in any manner,
and the present invention is not limited to any particular
implementation. As will be appreciated by those skilled in the art, an
information warehouse is typically implemented by replicating the
information stored in each of the databases 35 in a single storage
system to form the information warehouse.
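
The replication described above can be pictured with a minimal sketch.
It assumes, purely for illustration, that each service-area database
is a list of dictionary records keyed by a service-area label; the
function name, record fields and sample values are hypothetical and
are not taken from the patent.

```python
from typing import Dict, List

Record = Dict[str, str]


def build_warehouse(area_databases: Dict[str, List[Record]]) -> List[Record]:
    """Copy every record from each service-area database into one store.

    Each copy is tagged with the service area it came from, so the
    consolidated view can still be traced back to its source database.
    """
    warehouse: List[Record] = []
    for area in sorted(area_databases):
        for record in area_databases[area]:
            replica = dict(record)          # replicate, do not share
            replica["service_area"] = area  # hypothetical provenance field
            warehouse.append(replica)
    return warehouse


# Hypothetical contents of two service-area databases.
area_databases = {
    "25A": [{"system": "21", "part": "disc drive", "serial": "S-001"}],
    "25N": [{"system": "23", "part": "disc drive", "serial": "S-002"}],
}
warehouse = build_warehouse(area_databases)
```

Copying rather than sharing the records keeps the per-area databases
unchanged, which matches the idea of a warehouse that is a replica
rather than the primary store.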

The embodiment of the invention shown in FIG. 4 can be implemented in
a number of different ways. For example, the service requests for each
of the agents 29, 31 can be serviced solely by the servers 33 in the
one of the service areas 25A-25N that is coupled to the same
subnetwork. In this respect, if a request is not serviced within a
predetermined period of time, the requesting agent can simply
retransmit the request over the subnetwork. Alternatively, the system
can be implemented so that although initial preference is given to
servicing all requests locally within its particular subnetwork, when
a service request goes unanswered, the requesting agent can
rebroadcast the request globally over network 27 to other service
areas 25A-25N. As discussed above, the different service areas can be
located within one facility, or can be remotely located at great
distances from each other.
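
The two servicing policies just described can be sketched as follows.
This is only an illustration under stated assumptions: each server is
modeled as a hypothetical callable that returns a reply string, or
None to stand for "no answer before the time out"; the function and
parameter names are invented for the example.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical server model: returns a reply, or None for a time out.
Server = Callable[[str], Optional[str]]


def service_request(request: str,
                    local_area: str,
                    areas: Dict[str, List[Server]],
                    allow_global: bool) -> Optional[str]:
    """Prefer servers in the local service area; optionally go global."""
    # Local preference: only servers on the agent's own subnetwork.
    for server in areas.get(local_area, []):
        reply = server(request)
        if reply is not None:
            return reply
    if not allow_global:
        return None  # local-only policy: the caller may retransmit later
    # Global rebroadcast: offer the request to the other service areas.
    for area, servers in areas.items():
        if area == local_area:
            continue
        for server in servers:
            reply = server(request)
            if reply is not None:
                return reply
    return None


# Hypothetical topology: area 25A is unresponsive, area 25N answers.
areas = {
    "25A": [lambda req: None],
    "25N": [lambda req: "25N handled " + req],
}
local_only = service_request("disc-update", "25A", areas, allow_global=False)
with_global = service_request("disc-update", "25A", areas, allow_global=True)
```

With the local-only policy the request stays unanswered and would be
retransmitted on the subnetwork; with the global policy it is picked
up by a server in another service area.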

One advantageous feature of the embodiments of the present invention
shown in FIGS. 2-4 is that they require very little support in terms
of changes to the systems 21, 23 being monitored. In particular, most
computer or data processing systems employ some processing hardware
that can be used to implement the agent 29, 31, so that no hardware
support need be added to the data processing system. As discussed
above, in the illustrative embodiment wherein the systems being
monitored are storage subsystems, these systems typically include a
service processor implemented by a PC. A block diagram of such a
service processor is shown in FIG. 5. The service processor 53
includes a processor 55 on which software programs can be executed and
a memory 57 in which the software can be stored. The agent can be
implemented in software that is stored in the memory 57 and executed
on the processor 55. As shown in FIG. 5, the service processor 53
conventionally includes a network interface 54 (e.g., an Ethernet
port) that enables the processor 55 to communicate with a network
cloud, as well as a non-network interface 56 (e.g., a SCSI port) that
enables the processor 55 to communicate with the data processing
system. For example, the non-network interface may be coupled to a
backplane in the data processing system. As used herein, the term
network interface is used to define an interface capable of
communicating with a network cloud over which communication is
performed using a network protocol, so that information transferred
through the cloud includes a destination address to enable the
information to be transported by the cloud to the appropriate
destination.

It should be appreciated that although the service processor provides
a convenient platform on which to implement the agent, the present
invention is not limited in this respect, because the agent can be
implemented in other ways. For example, the data processing system may
include other processing hardware that has access to a network
interface and can serve as the agent. If the system 21, 23 (FIG. 3)
being monitored is a PC or other general purpose computer, the agent
can simply be implemented in software that executes on the general
purpose computer. Alternatively, the agent can be implemented in the
special purpose hardware implemented either on the system being
monitored, or in an associated device that is coupled thereto.
An illustrative flowchart for a software program to implement the
agent 29, 31 is shown in FIG. 6. It should be appreciated that the
present invention is not limited to this particular implementation, as
numerous others are possible.

In the illustrative embodiment of the invention shown in FIG. 6, the
system is provided with the advantageous capability of enabling the
particular files monitored by the agent 29, 31 in each system to be
updated or controlled by the APC monitor 25. The agent can be
initialized with a default set of files to monitor, but this list can
be altered by the APC monitor 25. Thus, modifications to the monitored
systems, either in the manufacture/test environment or in the field,
can be made automatically by simply updating the database 35 in the
APC monitor, without requiring manual updates to each system. Although
this automatic reconfiguration provides the advantages described
above, it should be appreciated that the invention is not limited in
this respect, and that the agents can simply monitor files according
to a list initially stored therein, without checking for updates from
the APC monitor.

Upon powering up of the processor on which the agent is implemented,
the agent program begins. Initially, in step 61, the agent broadcasts
a service request to determine whether any updates should be made to
its list of files to be monitored. A service is implemented on at
least one of the servers 33A-33N (FIG. 3) to handle this service
request in the manner discussed above. In step 63, the program checks
to determine whether a response from one of the servers is received
before a time out condition occurs. If no response is received within
this time period, the program proceeds to step 65, wherein a
determination is made as to whether requests from the agent are to
remain within a particular subnetwork, or should be broadcast more
globally. If it is determined at step 65 that more global broadcasts
over the network are to be made, the program proceeds to step 67,
wherein the request is rebroadcast globally. Thereafter, the method
proceeds to step 69, wherein a determination is made as to whether a
response has been received before a time out condition has occurred.

When it is determined at step 65 that the program is to limit its
broadcasts to the local subnetwork, or when it is determined at step
69 that no response was received to the global broadcast before a time
out occurred, the program proceeds to step 71, wherein a determination
is made as to whether the broadcast requesting updates to the list of
monitored files should be retried. In this respect, the agent can be
initialized to re-try the broadcast a number of times before simply
proceeding with the list of monitored files with which it was
initialized. When it is determined that the broadcast should be
retried, the program returns to step 61.

When it is determined at either of steps 63 or 69 that a response to
the broadcast was received, or when it is determined at step 71 that
the broadcast should no longer be retried, the program proceeds to
step 73, wherein a determination is made as to whether the initialized
list of monitored files should be updated, and when it should, the
program proceeds to step 75 to update the list based on the
information returned by the server that responded to the broadcast
request.
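
Steps 61 through 75 can be summarized in a short sketch. It is not the
flowchart of FIG. 6 itself, only one reading of the text above. The
`broadcast` callable is hypothetical and stands in for a broadcast
plus its time out check: it returns None when no server answers in
time, an empty list when a server answers that no update is needed,
and otherwise a replacement list of files. Treating the reply as a
full replacement list is an assumption made for the example.

```python
from typing import Callable, List, Optional

# Hypothetical: broadcast(scope) -> reply list, or None on time out.
Broadcast = Callable[[str], Optional[List[str]]]


def resolve_monitored_files(default_files: List[str],
                            broadcast: Broadcast,
                            allow_global: bool,
                            max_tries: int) -> List[str]:
    """Return the list of files the agent should monitor after start-up."""
    for _ in range(max_tries):               # step 71 bounds the retries
        reply = broadcast("local")           # steps 61 and 63
        if reply is None and allow_global:   # step 65
            reply = broadcast("global")      # steps 67 and 69
        if reply is not None:
            # Step 73: an empty reply means the default list stands.
            # Step 75: otherwise adopt the list returned by the server.
            return list(reply) if reply else list(default_files)
    # Retries exhausted: proceed with the list the agent started with.
    return list(default_files)


# Hypothetical server that only answers the global rebroadcast.
calls: List[str] = []


def fake_broadcast(scope: str) -> Optional[List[str]]:
    calls.append(scope)
    return ["disc.log"] if scope == "global" else None


files = resolve_monitored_files(["default.log"], fake_broadcast, True, 3)
```

Bounding the loop with `max_tries` reflects the statement that the
agent retries a number of times and then simply proceeds with its
initial list, so a silent monitor never blocks file monitoring.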

When it is determined at step 73 that the list of monitored files
should not be updated, or after the list is updated at step 75, the
program proceeds to step 77, wherein it begins to process the first
file on the list of monitored files. The program then proceeds to step
79, wherein a determination is made as to whether the file being
processed has been updated since the last time it was checked. When it
has, the
method proceeds to step 81, wherein the newly updated information is
stored. As discussed above, this is advantageous because it ensures
that no data will be lost in the event that there is a delay in one of
the servers responding to the broadcast request.

The program next proceeds to step 83, wherein a determination is made
as to whether to arbitrate to select a particular server to handle the
service request for the updated file, or whether to simply employ a
previously selected server. Although this determination is
advantageous for the reasons discussed immediately below, it should be
understood that the invention is not limited in this respect, and that
the agent could simply arbitrate for a server each time a request is
broadcast.

As discussed above, in one embodiment of the present invention,
multiple servers 33A-33N (FIG. 3) can be provided with the capability
of responding to a particular type of service request. Thus, when an
agent broadcasts a service request over the network 27 (FIG. 2),
multiple servers may respond indicating that they have the capability
of servicing the request. The agent then selects between the
responding servers to determine which will handle the request. This
process is advantageous for fault tolerance reasons because multiple
servers can handle each request. Furthermore, the agent can select the
particular server that can handle the request most efficiently.
However, it should be appreciated that this arbitration process
requires some resources in terms of processing time and traffic over
the network 27. Thus, in one embodiment of the invention, once an
agent arbitrates to select a particular server to handle a specific
type of service request, the agent will automatically send all future
requests for the same type of service to that server for a
predetermined period of time, rather than arbitrating separately for
numerous service requests of the same type.

Consistent with the foregoing, at step 83, a determination is made as
to whether to arbitrate for a new server. When it is determined that
such arbitration should not take place, the program proceeds to step
85, wherein the relevant data for the service request is simply sent
to the previously selected service provider.

When it is determined at step 83 that the agent should arbitrate for a
new service provider, the program proceeds to step 87, wherein the
service request is broadcast over the appropriate subnetwork (FIG. 4).
At step 89, the program determines whether a response is received
before a time out condition, and when one is not, the program proceeds
to step 91, wherein a determination is made as to whether the service
request should remain within the local subnetwork. When it is
determined that the request should not remain within the local
subnetwork, the program proceeds to step 93, wherein the service
request is rebroadcast globally, and the program then proceeds to step
95 to determine whether a response is received before a time out
condition. When it is determined at either of steps 89 or 95 that a
response has been received from at least one of the servers before the
time out condition, the program proceeds to step 97, wherein a service
provider is selected from those that responded.

The service provider can be selected in step 97 in any of a number of
ways, using either a very simple selection algorithm or a more complex
one. For example, each server can be assigned a number that it can
return in the message sent to the requesting agent, and the agent can
simply select the highest or lowest number to select a particular
service provider. However, in accordance with one embodiment of the
invention, a more complex selection algorithm is employed in an
attempt to increase the efficiency of the system. In particular, each
of the servers 33A-33N (FIG. 3) that responds to a service request can
calculate a cost associated with responding to the request. A number
of factors can be considered in determining the cost of servicing a
request, including a number of service requests that the server may
have queued up, available memory in the server, etc. Each server can
then respond to the requesting agent with its cost information, and
the agent can select the responding server with the lowest cost to
handle the request.

After a particular server is selected, the program proceeds to step
99, wherein the updated information associated with the service
request is sent to the selected server from the queue in which it was
stored in step 81. The information sent to the server can include the
full updated file that was monitored by the agent, or some subset of
information indicating what portion of the file has been changed.

When it is determined at step 91 that requests from the agent should
remain in the local subnetwork, or when no response is received to the
global rebroadcast in step 95, the program proceeds to step 101,
wherein the service request is added to a list to be retried.

After the data is sent to a service provider in either of steps 85 and
99, or after the request is added to the retry list in step 101, the
program proceeds to step 103, to determine whether one of the
broadcast requests on the retry list should be rebroadcast. In this
respect, as stated above, when a problem is encountered in getting a
server to respond to a service request, there is no danger that the
data will be lost, because it is stored in step 81. The illustrative
program shown in FIGS. 6a-b will occasionally rebroadcast such
requests. However, the program will not loop indefinitely through
steps 83-103 attempting to retry any service request for which a
response was not received. In this respect, the program places a
priority on continually looping through the list of files to be
monitored, to ensure that no updates are missed. Therefore, a decision
will be made in step 103 only periodically to attempt to rebroadcast
some of the service requests on the retry list. This determination can
be based on the number of monitored files processed between service
request retries, on a particular time period, or some other suitable
criteria.

When it is determined at step 103 that one of the service requests on
the retry list should be rebroadcast, the program proceeds to step 105
to go to the next service request on the retry list, and then returns
to step 83. Alternatively, when it is determined at step 103 that the
program should not rebroadcast one of the service commands on the
retry list, the program proceeds to step 107, wherein it goes to the
next file on the monitor list, and then returns to step 79.

As discussed above, the embodiments of the present invention discussed
above are not limited to a manufacture/test environment, and can be
used in connection with systems installed in the field. The APC
monitor 25 (FIG. 2), or its service areas 25A-25N (FIG. 4), can be
customer service centers located anywhere in the world, with access to
remote field installations over a network 27 that can be implemented,
for example, as a portion of an intranet or over the Internet. The
customer service center can employ a heart beat service as discussed
above to ensure that each of the systems in the field is functioning
properly. Furthermore, each machine in the field can broadcast service
requests when it experiences a problem that should be brought to the
attention of customer service. It should be appreciated that this
embodiment of the present invention has a number of advantages over
the call home technique discussed above. First, the use of a
modem-less connection (e.g., network 27)
to connect the systems in the field with the customer service center
increases the reliability of the monitoring system. Second, the
heartbeat service ensures that if a system in the field experiences a
problem severe enough to inhibit it from sending a service request to
the customer service center, this problem will be detected quickly,
likely before the customer is aware of the problem.

It should be appreciated that with both customer installations in the
field and the manufacture/test environment, systems may periodically
be added to or removed from the monitoring system of the present
invention. To facilitate this process, in one embodiment of the
invention, each agent has the capability of sending two types of
service requests, a first requesting registration so that the
corresponding system is added to the monitoring system, and a second
requesting that the system be removed from the monitoring system. The
servers 33A-33N (FIG. 3) can include a service to handle these
registration requests, so that changes to the list of registered
systems can be reflected in the database and the list of monitored
systems for the heart beat service.

As discussed above, in one embodiment of the present invention, when a
relevant data file is updated in a system 21, 23 (FIG. 2) being
monitored, this information is stored by the agent 29, 31 (FIG. 2) so
that it is not lost. In the embodiment of the invention wherein the
agent is implemented in the service processor of the monitored system,
this information can simply be stored in the service processor. For
example, when the service processor is implemented as a PC, the
information can be stored in the hard drive associated with the PC.
However, the invention is not limited in this respect, as the queued
information can be stored in other locations, including on a dedicated
storage device coupled to the agent. In one embodiment of the
invention, the amount of storage provided to queue updated information
awaiting service from the APC monitor 25 is configurable. If the
amount of information to be stored exceeds the configured amount, one
embodiment of the invention continues to write new information to the
storage area, so that the oldest data will be lost first. However,
when used in conjunction with the heart beat service discussed above,
it is believed that many problems that would result in a failure of
the updated information to be transferred from the monitored system to
the APC monitor's database 35 (FIG. 2) should be detected quickly, so
that significant amounts of data should not be lost.

Although the use of a network to couple together the APC monitor
system 25 and the systems 21, 23 being monitored provides a number of
advantages discussed above, it should be appreciated that the present
invention is not limited in this respect, and that the communication
protocol discussed above in connection with FIGS. 2-6 can
alternatively be employed with different types of communication links
being established between the APC monitor system 25 and the monitored
systems. For example, modem and telephone line connections can be
employed. Each time a service request is broadcast, the system being
monitored can dial a designated telephone number for a server that can
respond to the particular type of request. Similarly, the heartbeat
service can dial into each of the systems being monitored to ensure
that each is operating properly.

The aspects of the present invention directed to the monitoring of
systems in the field are not limited to the implementation described
above that employs transaction-based service requests issued from the
agent associated with each system in the field, and the heart beat
service executed on the customer service center. One alternate
embodiment of the present invention employs polling by the customer
service center that operates in much the same manner as the central
monitor system 5 that was discussed above in connection with FIG. 1,
and has previously been employed only in a manufacture/test facility.
This polling technique provides an advantage over the call home
technique discussed above for use in the field, in that the customer
service center controls communication with the systems in the field,
so that if a problem is encountered that would prevent a system from
calling home, such a problem will be detected. This polling technique
can be employed with a modem/telephone line connection between the
systems in the field and the customer service center, or these systems
can be connected via a network in the manner described above to
increase reliability.

In another embodiment of the invention, the polling techniques of the
prior art system of FIG. 1 are combined with the queuing of
information by the systems being monitored to ensure that no data is
lost as a result of the loop time inherent in a polling system. This
aspect of the present invention can be used to monitor systems in the
field, or in a manufacture/test environment. This embodiment of the
invention can also be employed with a modem/telephone line connection
between the monitored systems and the central monitor, or these
connections can be accomplished via a network in the manner described
above.

As discussed above, one advantageous feature of the embodiments of the
present invention that generate a service request when a file is
updated is that the monitoring system 25 and its database can also be
employed for inventory control. As discussed above, in accordance with
one embodiment of the invention, each component and subassembly is
provided with a part number and a serial number that can be read via
the agent associated with the monitored system 21, 23 (FIG. 2). Thus,
when a component or subassembly is added or removed from one of the
systems being monitored, an internal file is updated, resulting in a
service request that causes the new component/subassembly information
to be loaded into the database 35. It should be appreciated that this
embodiment of the invention thereby enables inventory to be tracked
automatically, such that the database 35 (FIG. 2) will store correct
information regarding the inventory used in the monitored systems 21,
23, without requiring any manual updates to the database when
components or subassemblies are added to or removed from one of the
monitored systems. It should be appreciated that this inventory
tracking is beneficial not only in a manufacture/test environment, but
also for monitoring systems in the field. In particular, customers may
from time to time trade in equipment when purchasing new systems.
Thus, for inventory control purposes, it may also be useful to have
the database 35 include information concerning all of the components
and subassemblies in the systems in the field.

In the discussion above, one example provided of a computer or data
processing system that can be monitored in accordance with the
embodiments of the present invention is a storage subsystem. However,
as discussed above, it should be appreciated that the present
invention is not limited in this respect, and that the present
invention can be employed to monitor computer or data processing
systems of numerous other types, including general purpose computers
and other systems that have some processing capabilities. In addition,
it should be appreciated that the present invention can be used to
monitor subassemblies of a complete system. For example, during the
subassembly test process, the electronic boards that control the
operation of a system such as a storage subsystem can be tested as a
subassembly, and the status of the subassembly tests, and parts for
inventory
control, can be monitored using the embodiments of the
present invention in the manner described above.

As described above, some types of data processing systems
(e.g., CPUs and PCs) have conventionally been provided
with a network interface that enables them to communicate
with other devices over a network using a network
protocol. When the above-described aspects of the present
invention are used to monitor such systems, those systems
can be coupled to the network in a conventional fashion
using their network interfaces. However, other types of data
processing systems have not conventionally been directly
coupled to a network. Many types of data processing systems
(e.g., storage subsystems) have only previously been
connected to a network through a non-network interface that
is coupled to another computer (e.g., a CPU), that in turn has
a network interface capable of communicating using a
network protocol. One aspect of the present invention is
directed to a new architecture that directly couples such
systems to a network. This architecture is advantageous for
a number of applications in addition to the monitoring
application discussed above.

As discussed above, one embodiment of the present
invention implements the agent in the service processor of
the system 21, 23 (FIG. 2) being monitored, and couples the
service processor directly to a network through a network
interface provided on the service processor. Thus, the data
processing system has a network interface (provided by the
service processor) that is directly coupled to a network. The
service processors for many types of data processing systems
have not conventionally been directly coupled to a
network. Rather, as mentioned above, many data processing
systems that employ a service processor have only
conventionally been coupled to a network through another
computer. In accordance with one embodiment of the present
invention, a different configuration is employed, as shown in
FIG. 7. In this configuration, the data processing system 113
has a network interface 111 that is directly connected to a
network 27. The network interface 111 may be provided by
the service processor as described above, or a separate
network interface can be provided that bypasses the service
processor. As shown in FIG. 7, the data processing system
113 may also optionally have a conventional connection to
the network 27 through a non-network interface 119 that is
coupled to the network via a CPU 117.

The coupling of the data processing system 113 directly to
the network 27 via its network interface 111 is a powerful
new configuration, that enables the implementation of a
number of useful applications in addition to the monitoring
application described above.

In another embodiment of the invention, one such application
involves communication between the APC monitor
25 (FIG. 2) and the plurality of monitored systems 21, 23 in
a manufacture/test environment or in the field, that enables
automatic updates to the monitored systems. For example,
the connection of the network interface 111 of a data
processing system 113 in the field to a customer service
facility 115 as shown in FIG. 7 enables software updates to
be made to the data processing system automatically.
Although it is conventional to download software over a
network such as the Internet, such downloads have not been
done automatically, and have required manual intervention
by a system operator. In accordance with one embodiment of
the present invention, the agent for the data processing
system 113 can automatically and periodically send to the
customer service site 115, over the network 27, service
requests that provide the revision numbers of certain software
loaded on the data processing system, and query
whether those revisions are up to date. In response to those
service requests, the customer service center 115 can
automatically download any new revisions of software to the
data processing system 113. This downloading of software is
entirely automatic, and does not require any manual
intervention at the customer service center 115 or the data
processing system 113. Once the new software is loaded into
a database 116 at the customer service center 115, the
revision is automatically downloaded to every data processing
system 113 coupled to the customer service center 115
via the network, in a manner that is entirely transparent to
the users of the data processing systems. This elimination of
manual intervention is particularly advantageous because
updates are often conventionally done by technicians
going to the physical site of each data processing
system.

Although described in connection with software
updates in the field, it should be appreciated that
this aspect of the present invention is also advantageous in a
manufacture/test facility to update software on all of the data
processing systems under test.

In addition to software updates, in a manufacture/test
environment or in the field, there may be minimum revision
levels of hardware that are supported for the data processing
systems being monitored. In another embodiment of the
invention, the database in the customer service center 115
(or an analogous monitoring center in a manufacture/test
environment) determines the revision level of every component
and subassembly in every system to which it is
connected via the network 27. This can be done using the
part and serial numbers as described above. When a change
is made in the minimum revision level needed for any
component or subassembly, the customer service center 115
sends a message to each data processing system 113 that
does not meet that revision level, notifying the data processing
system that the particular hardware should be
updated. This information can then be relayed to a system
administrator who can oversee the hardware update. Again,
this notification procedure is automatic in that it requires no
manual intervention by a system operator.

It should be appreciated that the embodiments of the
present invention discussed above relating to the monitoring
and automatic updating of systems in the field or a
manufacture/test environment are not limited to the architecture
shown in FIG. 7. Although specifically described in
connection with FIG. 7, it should be appreciated that the
automatic updating feature of the present invention can be
implemented using any of the configurations discussed
above in connection with FIGS. 2-6. When the data processing
systems being monitored/updated are conventionally
provided with a network interface that can be coupled to a
network (e.g., if the systems being monitored are PCs or
other general purpose computers), that connection can be
used to communicate with the APC monitor 25.
Alternatively, for applications involving data processing
systems that are not conventionally coupled directly to a
network, such a connection can be provided in the manner
discussed above.

Although the use of a networked connection is advantageous
for the reasons discussed above, it should be appreciated
that the embodiments of the present invention relating
to the automatic software and hardware updates are not
limited in this respect. The passing of information between
the APC monitor 25 and the monitored systems can
alternatively be accomplished in other ways, e.g., via a
modem/telephone line connection.

It should be appreciated that the embodiments of the
present invention described above provide an improvement
over conventional 3-tier client server systems. The embodiments
of the invention shown, for example, in FIGS. 2-7
include components of a traditional 3-tier client server
system, wherein data flows from the client or first tier (e.g.,
the systems 21, 23 being monitored) to a server or second
tier (e.g., the monitoring system 25), wherein it is processed
and then stored in a third tier (e.g., the database 35). A
number of embodiments of the present invention add
another layer that can be considered as a fourth tier, and
which is generically described herein as a process manager.
The process manager monitors information that enters the
database in the monitor system (e.g., database 35 in FIG. 2),
and reacts to it in a number of ways depending upon the
nature of the information. Several specific examples of the
process manager have been described above. For example,
when the heart beat service 37 (FIG. 3) detects that one of
the systems in the field or under test is experiencing a
problem that prohibits it from communicating over the
network, information can be written to the database 35 (FIG.
3), which in turn causes the process manager to e-mail
and/or page a system administrator to address the problem.
Another example of the process manager is the embodiment
of the invention discussed above wherein software updates
to the database can result in the process manager broadcasting
information to the monitored systems to automatically
update the software on those systems. In these situations, the
central monitoring system acts more as a client, with the
monitored systems acting in a capacity that is generally
viewed as that of a server. The process manager or fourth tier
provides a closed feedback loop system and bi-directional
communication between the monitoring system and the
systems being monitored.

Having described several embodiments of the invention in
detail, various modifications and improvements will readily
occur to those skilled in the art. Such modifications and
improvements are intended to be within the spirit and scope
of the invention. Accordingly, the foregoing description is
by way of example only, and is not intended as limiting. The
invention is limited only as defined by the following claims
and the equivalents thereto.

What is claimed is:

1. A method of monitoring a plurality of remote data
processing systems installed at a plurality of remote customer
sites from a local monitoring system disposed at a
local customer service site to determine when any of the
remote data processing systems experiences a failure, the
method comprising steps of:

(A) coupling the plurality of remote data processing
systems to the local monitoring system via a network
cloud; and

(B) when one of the plurality of remote data processing
systems experiences a failure, detecting the failure
based upon communications over the network cloud
between the one of the plurality of remote data processing
systems at the remote customer site and the
local monitoring system at the customer service site.

2. The method of claim 1, wherein step (B) includes steps,
performed by the monitoring system, of:

periodically transmitting communications over the network
cloud to the one of the plurality of data processing
systems to determine whether the one of the plurality of
data processing systems is capable of returning a
responsive communication over the network cloud; and

when no responsive communication is received from the
one of the plurality of data processing systems, determining
that the failure has occurred in the one of the
data processing systems.

3. The method of claim 2, wherein step (B) includes steps,
performed by the one of the data processing systems, of:

detecting the failure; and

broadcasting a service request over the network cloud
from the one of the plurality of data processing systems
to the monitoring system, the service request indicating
that the failure has occurred in the one of the plurality
of data processing systems.

4. The method of claim 1, further including a step,
performed by the monitoring system, of notifying service
personnel of the failure in the one of the plurality of data
processing systems.

5. The method of claim 1, wherein step (B) includes steps,
performed by the one of the data processing systems, of:

detecting the failure;

storing information identifying a nature of the failure; and

broadcasting a service request over the network cloud
from the one of the plurality of data processing systems
to the monitoring system, the service request indicating
that the failure has occurred in the one of the plurality
of data processing systems.

6. The method of claim 5, wherein step (B) further
includes steps of:

transmitting, over the network cloud, a response from the
monitoring system to the service request indicating that
the monitoring system is prepared to process the service
request; and

transmitting, from the one of the plurality of data processing
systems to the monitoring system, the stored
information indicating the nature of the failure.

7. The method of claim 6, wherein the monitoring system
includes a database, and wherein step (B) further includes a
step of storing in the database the information indicating the
nature of the failure.

8. The method of claim 5, wherein step (B) further
includes a step of:

when a response is received from the monitoring system
indicating that the monitoring system is prepared to
process the service request, transmitting from the one
of the plurality of data processing systems to the
monitoring system the stored information indicating
the nature of the failure.

9. The method of claim 1, wherein each of the plurality of
data processing systems is a storage system that includes a
plurality of disc drives.

10. The method of claim 9, wherein each of the plurality
of data processing systems has a service processor, and
wherein step (A) includes a step of directly coupling the
service processor of each of the plurality of data processing
systems to the network cloud.

11. The method of claim 1, wherein the monitoring system
includes a database, and wherein step (B) further includes a
step of storing in the database information indicating the
nature of the failure.

12. The method of claim 1, wherein step (B) further
includes steps, performed by the monitoring system, of
periodically transmitting inquiries over the network cloud to
the one of the plurality of data processing systems requesting
information as to whether the one of the plurality of data
processing systems has experienced a failure.

13. The method of claim 12, wherein step (B) further
includes steps, performed by the one of the data processing
systems, of:

detecting the failure;

storing information indicating the nature of the failure;
and

responding to the inquiries by transmitting the stored
information, indicating the nature of the failure, over 
the network cloud to the monitoring system. 

14. The method of claim 1, wherein step (B) further 
includes steps, performed by the one of the data processing
systems, of:

detecting the failure; 

storing information indicating the nature of the failure; 
and 

responding to periodic communications from the monitoring
system by transmitting to the monitoring system
the stored information indicating the nature of the 
failure. 

15. The method of claim 1, wherein each of the plurality
of data processing systems has a service processor, and
wherein step (A) includes a step of directly coupling the 
service processor of each of the plurality of data processing 
systems to the network cloud. 

16. The method of claim 1, wherein step (A) includes a 
step of coupling the plurality of data processing systems to
the monitoring system via a network cloud that includes the 
Internet. 

17. The method of claim 16, wherein each of the plurality 
of data processing systems is a storage system that includes
a plurality of disc drives.

18. The method of claim 1, wherein step (B) includes 
steps, performed by the one of the data processing systems, 
of: 

detecting the failure; and 

broadcasting a service request over the network cloud 
from the one of the plurality of data processing systems 
to the monitoring system, the service request indicating 
that the failure has occurred in the one of the plurality 
of data processing systems.

19. A method of monitoring a plurality of data processing 
systems from a monitoring system to determine when any of 
the data processing systems experiences a failure, wherein 
the plurality of data processing systems and the monitoring 
system each is installed in a manufacture/test facility, the 
method comprising steps of: 

(A) coupling the plurality of data processing systems to 
the monitoring system via a network cloud; 

(B) executing a plurality of tests on each of the plurality
of data processing systems to test the functional operation
of the plurality of data processing systems, each
one of the plurality of tests generating a failure when
one of the plurality of data processing systems does not
properly execute the one of the plurality of tests; and

(C) when one of the plurality of data processing systems
experiences a failure, detecting the failure at the moni- 
toring system based upon communications over the 
network cloud between the one of the plurality of data 
processing systems and the monitoring system. 

20. The method of claim 19, wherein step (B) includes a
step of: 

when the one of the plurality of data processing systems 
experiences a failure, transferring information from the 
one of the plurality of data processing systems to the 
monitoring system through the network cloud, the
information indicating to the monitoring system that 
the one of the plurality of data processing systems has 
experienced the failure. 

21. The method of claim 19, wherein step (B) includes 
steps, performed by the one of the data processing systems,
of: 

detecting the failure; and 
broadcasting a service request over the network cloud
from the one of the plurality of data processing systems 
to the monitoring system, the service request indicating 
that the failure has occurred in the one of the plurality 
of data processing systems. 

22. The method of claim 19, wherein each of the plurality 
of data processing systems has a service processor, and 
wherein step (A) includes a step of directly coupling the 
service processor of each of the plurality of data processing 
systems to the network cloud. 

23. The method of claim 19, wherein the monitoring 
system includes a database, and wherein step (B) further 
includes a step of storing in the database information indi- 
cating the nature of the failures of each of the plurality of 
data processing systems in the manufacture/test facility. 

24. The method of claim 19, wherein step (A) includes a 
step of coupling the plurality of data processing systems to 
the monitoring system via a network cloud that includes the 
Internet. 

25. The method of claim 19, wherein step (B) includes 
steps, performed by the one of the data processing systems, 
of: 

detecting the failure; 

storing information identifying a nature of the failure; and 
broadcasting a service request over the network cloud 
from the one of the plurality of data processing systems 
to the monitoring system, the service request indicating 
that the failure has occurred in the one of the plurality 
of data processing systems. 

26. The method of claim 25, wherein step (B) further 
includes steps of: 

transmitting, over the network cloud, a response from the 
monitoring system to the service request indicating that 
the monitoring system is prepared to process the ser- 
vice request; and 

transmitting, from the one of the plurality of data pro- 
cessing systems to the monitoring system, the stored 
information indicating the nature of the failure. 
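
By way of illustration only, the exchange recited in claims 25 and 26 (a service request, a response indicating readiness, then transmission of the stored failure information) might be sketched as follows. The sketch is in Python and forms no part of the claims. The class names `MonitoredSystem` and `MonitoringSystem`, the dictionary message layout, and the direct method calls standing in for the network cloud are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class MonitoringSystem:
    """Hypothetical monitor; a list stands in for its database."""
    database: List[Tuple[str, str]] = field(default_factory=list)

    def receive_service_request(self, request: dict) -> dict:
        # Respond that the monitor is prepared to process the request.
        return {"system_id": request["system_id"], "ready": True}

    def receive_failure_info(self, system_id: str, info: str) -> None:
        # Store the information indicating the nature of the failure.
        self.database.append((system_id, info))


@dataclass
class MonitoredSystem:
    """Hypothetical agent running on a system under test."""
    system_id: str
    stored_failure: Optional[str] = None

    def detect_failure(self, nature: str) -> None:
        # Detect the failure and store information identifying its nature.
        self.stored_failure = nature

    def report(self, monitor: MonitoringSystem) -> bool:
        """Send a service request; send details only once acknowledged."""
        if self.stored_failure is None:
            return False
        response = monitor.receive_service_request(
            {"system_id": self.system_id, "type": "failure"}
        )
        if not response["ready"]:
            return False
        monitor.receive_failure_info(self.system_id, self.stored_failure)
        self.stored_failure = None
        return True
```

In this sketch the failure details are held locally until the monitor acknowledges the request, so a monitor that is not ready never receives a partial report.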

27. The method of claim 19, wherein step (B) includes 
steps, performed by the monitoring system, of: 

periodically transmitting communications over the net- 
work cloud to the one of the plurality of data processing 
systems to determine whether the one of the plurality of 
data processing systems is capable of returning a 
responsive communication over the network cloud; and 

when no responsive communication is received from the 
one of the plurality of data processing systems, deter- 
mining that the failure has occurred in the one of the 
data processing systems. 
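
As a non-limiting illustration, the periodic check recited in claim 27 might be sketched in Python as a single sweep over the monitored systems. The function name `heartbeat_sweep`, the `ping` callback standing in for a communication over the network cloud, and the `max_missed` threshold are assumptions made for the example; the claim itself only requires that a failure be determined when no responsive communication is received.

```python
from typing import Callable, Dict, List


def heartbeat_sweep(
    systems: List[str],
    ping: Callable[[str], bool],
    missed: Dict[str, int],
    max_missed: int = 1,
) -> List[str]:
    """Ping every system once and return those deemed to have failed.

    `ping` returns True when a responsive communication came back.
    `missed` carries the count of consecutive misses between sweeps.
    """
    failed = []
    for system_id in systems:
        if ping(system_id):
            missed[system_id] = 0  # a response resets the miss counter
        else:
            missed[system_id] = missed.get(system_id, 0) + 1
            if missed[system_id] >= max_missed:
                failed.append(system_id)
    return failed
```

A caller would invoke the sweep on a timer and pass the returned identifiers to whatever notification step is in use. Raising `max_missed` above 1 tolerates a single dropped response at the cost of slower detection.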

28. The method of claim 27, wherein step (B) includes 
steps, performed by the one of the data processing systems, 
of: 

detecting the failure; and 

broadcasting a service request over the network cloud 
from the one of the plurality of data processing systems 
to the monitoring system, the service request indicating 
that the failure has occurred in the one of the plurality 
of data processing systems. 

29. The method of claim 19, wherein step (B) further 
includes steps, performed by the monitoring system, of 
periodically transmitting inquiries over the network cloud to 
the one of the plurality of data processing systems request- 
ing information as to whether the one of the plurality of data 
processing systems has experienced a failure. 
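
For illustration only, the polling recited in claim 29 might be sketched in Python as below. The name `poll_systems` and the `inquire` callback, which stands in for an inquiry sent over the network cloud and returns either the stored failure information or `None`, are assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, Optional


def poll_systems(
    system_ids: Iterable[str],
    inquire: Callable[[str], Optional[str]],
) -> Dict[str, str]:
    """Send one inquiry to each system and collect reported failures."""
    failures = {}
    for system_id in system_ids:
        report = inquire(system_id)
        if report is not None:
            # The system answered with stored failure information.
            failures[system_id] = report
    return failures
```

Unlike the heartbeat check, this sketch assumes every system answers the inquiry; it distinguishes a healthy system from a failed one by the content of the reply rather than by its absence.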

30. The method of claim 29, wherein step (B) further 
includes steps, performed by the one of the data processing 
systems, of: 
detecting the failure;

storing information indicating the nature of the failure;
and

responding to the inquiries by transmitting the stored
information, indicating the nature of the failure, over
the network cloud to the monitoring system.

31. The method of claim 19, wherein each of the plurality
of data processing systems is a storage system that includes
a plurality of disc drives.

32. The method of claim 31, wherein each of the plurality
of data processing systems has a service processor, and
wherein step (A) includes a step of directly coupling the
service processor of each of the plurality of data processing
systems to the network cloud.

33. An apparatus, comprising:

a network cloud;

a plurality of remote data processing systems installed at
a plurality of remote customer sites and coupled to the
network cloud; and

a local monitoring system disposed at a local customer
service site and coupled to the network cloud, wherein
the local monitoring system monitors the plurality of
remote data processing systems to determine when any
of the remote data processing systems experiences a
failure, and wherein the local monitoring system
detects a failure in one of the remote data processing
systems based upon communications over the network
cloud between the one of the plurality of remote data
processing systems and the local monitoring system.

34. The apparatus of claim 33, wherein each one of the
data processing systems includes:

means for detecting a failure in the one of the data
processing systems; and

means for, when a failure is detected, broadcasting a
service request over the network cloud to the monitoring
system indicating that the failure has occurred.

35. The apparatus of claim 34, wherein the monitoring
system includes a plurality of servers capable of responding
to each service request broadcast over the network by one of
the plurality of data processing systems.

36. The apparatus of claim 34, wherein each of the data
processing systems broadcasts a different type of service
request for different types of failures, and wherein the
monitoring system includes a plurality of servers capable of
responding to each type of service request broadcast over the
network by one of the plurality of data processing systems.

37. The apparatus of claim 36, wherein the apparatus
includes means for selecting a one of the plurality of servers
capable of responding to each type of service request most
efficiently.

38. The apparatus of claim 33, wherein the monitoring
system includes:

means for periodically transmitting communications over
the network cloud to each of the plurality of data
processing systems to determine whether each of the
plurality of data processing systems is capable of
returning a responsive communication over the network
cloud; and

means for determining that the failure has occurred in one
of the data processing systems when no responsive
communication is received from the one of the plurality
of data processing systems.

39. The apparatus of claim 38, wherein each one of the
data processing systems includes:

means for detecting a failure in the one of the data
processing systems; and

means for, when a failure is detected, broadcasting a
service request over the network cloud to the monitoring
system indicating that the failure has occurred.

40. The apparatus of claim 33, wherein the monitoring
system includes means for notifying service personnel when
a failure is detected in one of the plurality of data processing
systems.

41. The apparatus of claim 33, wherein each of the
plurality of data processing systems has a service processor
directly coupled to the network cloud.

42. The apparatus of claim 41, wherein the monitoring
system includes a database and means for storing in the
database the information indicating the nature of each failure.

43. The apparatus of claim 33, wherein the monitoring
system includes a database and means for storing in the
database information indicating the nature of each failure.

44. The apparatus of claim 33, wherein the monitoring
system further includes polling means for periodically
transmitting inquiries over the network cloud to each one of the
plurality of data processing systems requesting information
as to whether the one of the plurality of data processing
systems has experienced a failure.

45. The apparatus of claim 44, wherein each one of the
data processing systems includes:

means for detecting a failure in the one of the data
processing systems;

means for storing information indicating the nature of
each failure; and

means, responsive to the inquiries, for transmitting the
stored information that indicates the nature of the
failure over the network cloud to the monitoring system.

46. The apparatus of claim 33, wherein the network cloud
includes the Internet.

47. The apparatus of claim 46, wherein each of the
plurality of data processing systems is a storage system that
includes a plurality of disc drives.

48. The apparatus of claim 33, wherein each one of the
data processing systems includes:

means for detecting a failure in the one of the data
processing systems;

means for storing information identifying a nature of each
failure; and

means for, when a failure is detected, broadcasting a
service request over the network cloud to the monitoring
system indicating that the failure has occurred.

49. The apparatus of claim 48, wherein each of the
plurality of data processing systems includes means for,
when a response is received from the monitoring system
indicating that the monitoring system is prepared to process
the service request, transmitting from the one of the plurality
of data processing systems to the monitoring system the
stored information indicating the nature of the failure.

50. The apparatus of claim 33, wherein each of the
plurality of data processing systems is a storage system that
includes a plurality of disc drives.

51. The apparatus of claim 50, wherein each of the
plurality of data processing systems has a service processor
directly coupled to the network cloud.

52. An apparatus comprising:

a network cloud;

a plurality of data processing systems installed in a
manufacture/test facility coupled to the network cloud,
each one of the plurality of data processing systems
executing a plurality of tests to test the functional
operation of the one of the plurality of data processing
systems, each one of the plurality of tests generating a 
failure when the one of the plurality of data processing 
systems does not properly execute the one of the 
plurality of tests; and 
a monitoring system, coupled to the network cloud, to 
monitor the plurality of data processing systems to 
determine when any of the data processing systems 
experiences a failure, wherein the monitoring system 
detects a failure in one of the data processing systems 
based upon communications over the network cloud 
between the one of the plurality of data processing 
systems and the monitoring system. 

53. The apparatus of claim 52, further including means 
for, when one of the plurality of data processing systems 
experiences a failure, transferring information from the one 
of the plurality of data processing systems to the monitoring 
system through the network cloud, the information indicat- 
ing to the monitoring system that the one of the plurality of 
data processing systems has experienced the failure. 

54. The apparatus of claim 52, wherein each of the 
plurality of data processing systems has a service processor 
directly coupled to the network cloud. 

55. The apparatus of claim 52, wherein each one of the 
data processing systems includes: 

means for detecting a failure in the one of the data 

processing systems; 
means for storing information identifying a nature of each 

failure; and 

means for, when a failure is detected, broadcasting a 
service request over the network cloud to the monitor- 
ing system indicating that the failure has occurred. 

56. The apparatus of claim 55, wherein each of the 
plurality of data processing systems includes means for, 
when a response is received from the monitoring system 
indicating that the monitoring system is prepared to process 
the service request, transmitting from the one of the plurality 
of data processing systems to the monitoring system the 
stored information indicating the nature of the failure. 

57. The apparatus of claim 52, wherein the monitoring 
system includes a database, and wherein the apparatus 
further includes means for storing in the database informa- 
tion indicating the nature of the failures of each of the 
plurality of data processing systems in the manufacture/test 
facility. 

58. The apparatus of claim 52, wherein each of the 
plurality of data processing systems is a storage system that 
includes a plurality of disc drives. 

59. The apparatus of claim 58, wherein each of plurality 
of data processing systems has a service processor directly 
coupled to the network cloud. 

60. The apparatus of claim 52, wherein the network cloud 
includes the Internet. 

61. The apparatus of claim 52, wherein the monitoring 
system includes: 

means for periodically transmitting communications over 
the network cloud to each of the plurality of data 
processing systems to determine whether each of the 
plurality of data processing systems is capable of 
returning a responsive communication over the net- 
work cloud; and 

means for determining that the failure has occurred in one 
of the data processing systems when no responsive 
communication is received from the one of the plurality 
of data processing systems. 

62. The apparatus of claim 61, wherein each one of the 
data processing systems includes: 
means for detecting a failure in the one of the data
processing systems; and 

means for, when a failure is detected, broadcasting a 
service request over the network cloud to the monitor- 
ing system indicating that the failure has occurred. 

63. The apparatus of claim 52, wherein each one of the 
data processing systems includes: 

means for detecting a failure in the one of the data 
processing systems; and 

means for, when a failure is detected, broadcasting a 
service request over the network cloud to the monitor- 
ing system indicating that the failure has occurred. 

64. The apparatus of claim 52, wherein the monitoring 
system further includes polling means for periodically trans- 
mitting inquiries over the network cloud to each one of the 
plurality of data processing systems requesting information 
as to whether the one of the plurality of data processing 
systems has experienced a failure. 

65. The apparatus of claim 64, wherein each one of the 
data processing systems includes: 

means for detecting a failure in the one of the data 
processing systems; 

means for storing information indicating the nature of 
each failure; and 

means, responsive to the inquiries, for transmitting the 
stored information that indicates the nature of the
failure over the network cloud to the monitoring sys- 
tem. 
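
The polling arrangement of claims 64 and 65 can be sketched as follows. The class and function names are hypothetical. Clearing the stored information once it has been transmitted is an assumption made for the example, not something the claims require.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class DataProcessingSystem:
    """Hypothetical stand-in for one monitored system."""
    system_id: str
    failure_log: List[str] = field(default_factory=list)

    def record_failure(self, nature: str) -> None:
        # Store information indicating the nature of a detected failure.
        self.failure_log.append(nature)

    def answer_inquiry(self) -> List[str]:
        # Respond to an inquiry by handing over the stored information.
        # Assumption: the log is cleared after it has been transmitted.
        stored = list(self.failure_log)
        self.failure_log.clear()
        return stored


def poll_once(systems: Iterable[DataProcessingSystem]) -> Dict[str, List[str]]:
    """One polling pass: map each failing system ID to its stored failures."""
    report = {}
    for system in systems:
        stored = system.answer_inquiry()
        if stored:
            report[system.system_id] = stored
    return report
```

Here the monitored system only answers when asked, in contrast to the broadcast service request of claims 62 and 63.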

66. A method of automatically downloading an updated
piece of software to a plurality of data processing systems,
the plurality of data processing systems each being coupled 
to a service center, the method comprising steps of: 

(A) providing the updated piece of software on the service 
center; 

(B) periodically receiving service requests from each of 
the plurality of data processing systems, each service 
request including information from which a determi- 
nation can be made as to whether the data processing 
system that transmitted the request has a copy of the 
updated piece of software; 

(C) in response to the service requests, automatically 
determining which of the plurality of data processing 
systems do not have a copy of the updated piece of 
software; and 

(D) automatically downloading a copy of the updated 
piece of software to the data processing systems that do 
not have a copy of the updated piece of software. 
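
Steps (B) through (D) of claim 66 can be sketched as below. This is an illustration under stated assumptions: each service request is modeled as a hypothetical `(system_id, installed_version)` pair, a system is taken to lack the update when its version string differs from the updated one, and `send` is a hypothetical callable standing in for the actual download.

```python
from typing import Callable, Iterable, List, Tuple

# Assumed shape of a service request: (system_id, installed_version).
ServiceRequest = Tuple[str, str]


def systems_needing_update(requests: Iterable[ServiceRequest],
                           updated_version: str) -> List[str]:
    """Step (C): determine which systems do not have the updated software."""
    return [system_id for system_id, installed in requests
            if installed != updated_version]


def download_updates(requests: Iterable[ServiceRequest],
                     updated_version: str,
                     send: Callable[[str, str], None]) -> List[str]:
    """Step (D): send the update to every system that lacks it."""
    targets = systems_needing_update(requests, updated_version)
    for system_id in targets:
        # `send` is a hypothetical transfer routine supplied by the caller.
        send(system_id, updated_version)
    return targets
```

Because the decision is driven entirely by the content of the incoming service requests, no operator input is needed, which matches the limitation added by claim 68.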

67. The method of claim 66, wherein the plurality of data 
processing systems is coupled to the service center via a 
network cloud, and wherein step (D) includes a step of 
automatically downloading a copy of the updated piece of 
software over the network cloud. 

68. The method of claim 66, wherein steps (B)-(D) 
execute automatically, without any operator participation. 

69. A method of using a monitoring system to monitor the 
status of a plurality of data processing systems in a 
manufacture/test facility, the method comprising steps of: 

(A) executing a plurality of tests on each of the plurality 
of data processing systems to test the functional opera- 
tion of the plurality of data processing systems, each 
one of the plurality of tests generating a failure when 
one of the plurality of data processing systems does not 
properly execute the one of the plurality of tests; 

(B) when a failing one of the plurality of data processing 
systems experiences a failure; 

storing information in the failing one of the plurality of
data processing systems identifying a nature of the 
failure; and 

broadcasting a service request from the failing one of 
the plurality of data processing systems to the moni- 
toring system, the service request indicating that the 
failure has occurred; and 

(C) storing information in the monitoring system to record
the failure in response to information provided by the 
failing one of the plurality of data processing systems. 

70. The method of claim 69, wherein step (C) further 
includes steps of: 

transmitting a response from the monitoring system to the
service request indicating that the monitoring system is 
prepared to process the service request; 

receiving, from the failing one of the plurality of data 
processing systems, the stored information indicating 
the nature of the failure; and 

storing information in the monitoring system to record the 
failure based upon the stored information indicating the 
nature of the failure. 

71. The method of claim 70, further including steps, 
performed by the monitoring system, of: 

periodically transmitting communications to each of the
plurality of data processing systems to determine whether
each of the plurality of data processing systems is 
capable of returning a responsive communication over 
the network cloud; and 

when no responsive communication is received from one 
of the plurality of data processing systems, determining 
that a failure has occurred in the one of the data 
processing systems. 

72. A method of using a monitoring system to monitor the 
status of a plurality of data processing systems in a 
manufacture/test facility, the method comprising steps of: 

(A) executing a plurality of tests on each of the plurality 
of data processing systems to test the functional opera- 
tion of the plurality of data processing systems, each 
one of the plurality of tests generating a failure when 
one of the plurality of data processing systems does not 
properly execute the one of the plurality of tests; 

(B) periodically transmitting inquiries from the monitor- 
ing system to each of the plurality of data processing 
systems requesting information as to whether the one of
the plurality of data processing systems has experi- 
enced a failure; and 

(C) when a failing one of the plurality of data processing
systems experiences a failure;

storing information in the failing one of the plurality of 
data processing systems identifying a nature of the 
failure; and 

responding to one of the periodic inquiries by transmitting
the stored information that indicates the
nature of the failure to the monitoring system.

73. A method of monitoring a plurality of data processing
systems from a monitoring system to determine when any of
the data processing systems experiences a failure, the
method comprising steps of:

(A) coupling the plurality of data processing systems to 
the monitoring system via a network cloud; and 

(B) when one of the plurality of data processing systems 
experiences a failure, detecting the failure at the moni- 
toring system based upon communications over the 
network cloud between the one of the plurality of data 
processing systems and the monitoring system, 

wherein step (B) further includes steps of:

(B1) detecting the failure at the one of the data
processing systems, and broadcasting a service
request over the network cloud from the one of the
plurality of data processing systems to the monitoring
system, the service request indicating that
the failure has occurred in the one of the plurality
of data processing systems; and

(B2) periodically transmitting communications over
the network cloud from the monitoring system to
the one of the plurality of data processing systems
to determine whether the one of the plurality of
data processing systems is capable of returning a
responsive communication over the network
cloud, and when no responsive communication is
received from the one of the plurality of data
processing systems, determining that the failure
has occurred in the one of the data processing
systems.

* * * * * 